Evaluating and automating the annotation of a learner corpus
نویسندگان
چکیده
The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked tiers, designed to handle a wide range of error types present in the input. Each tier corrects different types of errors; links between the tiers allow capturing errors in word order and complex discontinuous expressions. Errors are not only corrected, but also classified. The annotation scheme is tested on a data set including approx. 175,000 words with fair inter-annotator agreement results. We also explore the possibility of applying automated linguistic annotation tools (taggers, spell checkers and grammar checkers) to the learner text to support or even substitute manual annotation.
منابع مشابه
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
متن کاملMetadiscourse Markers in a Corpus of Learner Language: The Case of Iranian EFL Learners
Different issues have been probed in learner corpus research since the late 1980s.However, taking the im- portance of meta discourse markers (MDMs) in signposting academic discourse, their use in Iranian EFL learners‟ academic essays is an area of research in need of a more serious analysis. Contributing to this line of investigation, this paper reports a corpus-based study of the use of MDMs i...
متن کاملError-Tagged Learner Corpus of Czech
The paper describes a learner corpus of Czech, currently under development. The corpus captures Czech as used by nonnative speakers. We discuss its structure, the layered annotation of errors and the annotation process.
متن کاملCreation and Analysis of a Reading Comprehension Exercise Corpus: Towards Evaluating Meaning in Context
We discuss the collection and analysis of a cross-sectional and longitudinal learner corpus consisting of answers to reading comprehension questions written by adult second language learners of German. We motivate the need for such task-based learner corpora and identify the properties which make reading comprehension exercises a particularly interesting task. In terms of the creation of the co...
متن کاملCreating a manually error-tagged and shallow-parsed learner corpus
The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallowpar...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Language Resources and Evaluation
دوره 48 شماره
صفحات -
تاریخ انتشار 2014